The market for Indian food in San Jose, California, is significant and growing. Drawn by opportunities in IT and other technology-related vocations, roughly ten percent of San Jose's residents are of Indian origin. Furthermore, Indian food is flavorful and appreciated by people of every ethnicity. Demand for Indian food will therefore continue to grow, and there exists, in addition to the profit motive, an unparalleled opportunity to promote authentic Indian cuisine in the Bay Area.
Having acknowledged the opportunity in the Bay Area, we can see many possible locations where an entrepreneur could start an Indian restaurant: there are good candidates in San Francisco, Oakland, and San Jose. For the purposes of this data science initiative, however, we will restrict ourselves to potential venues in San Jose. This is a sensible restriction because the cost of opening a business is lower in San Jose than in San Francisco, and San Jose has a higher percentage of Indian residents than Oakland. Ultimately, by demonstrating the efficacy of our methods in San Jose, we can apply similar techniques to decide on locations for future restaurants in other cities.
We will optimize our locations according to three criteria. First, we want few restaurants nearby, so that our new business enjoys geographic prominence. Second, we will pay special attention to areas with no Indian restaurants in the proximity of the selected location. Finally, once these two conditions are met, we want locations as close as possible to downtown San Jose, an area with a higher concentration of tourists. People are also more likely to discover our restaurant spontaneously in downtown San Jose while exploring or out for a night of fun.
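Taken together, these criteria can be sketched as a simple predicate. This is only an illustrative sketch: the function name and threshold values below are placeholders, not the exact parameters used later in the analysis.

```python
# A minimal sketch of the three criteria as a predicate (hypothetical names/thresholds).
def is_promising(restaurants_nearby, dist_to_indian_m, dist_to_downtown_m,
                 max_nearby=2, min_indian_dist_m=800):
    """Return (passes_filter, score); a lower score means closer to downtown."""
    if restaurants_nearby > max_nearby:        # criterion 1: geographic prominence
        return False, None
    if dist_to_indian_m < min_indian_dist_m:   # criterion 2: no Indian restaurants close by
        return False, None
    return True, dist_to_downtown_m            # criterion 3: prefer proximity to downtown

# Candidate tuples: (restaurants nearby, distance to Indian restaurant, distance to downtown)
candidates = [(1, 1200, 900), (5, 2000, 400), (0, 300, 700)]
results = [is_promising(*c) for c in candidates]
print(results)  # only the first candidate passes both filters
```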
We will then use the relevant Python libraries to identify the desirable areas and remark on their relative advantages and disadvantages. This way, potential entrepreneurs can combine a computational perspective with a human one (e.g., a good up-and-coming neighbourhood) when deciding to invest a large sum of money in a risky venture.
Considering the problem definition, let's identify the relevant features that will influence our estimate of a prospective venue's desirability:
We construct a lattice centered on Downtown San Jose; the points of the lattice represent the potential neighbourhoods. We will obtain the relevant information from the following sources.
We will create latitude and longitude coordinates for the centroids of the potential neighbourhoods for our restaurant venues. This lattice covers approximately 55 square miles and is centered on Downtown San Jose. Let's use the Google Maps API to find the coordinates of downtown San Jose.
#Removable Code with API credentials
google_api_key = ''
address = '200 E Santa Clara St, San Jose, CA 95113' #city hall
import requests
#function to extract latitude and longitude from results json
def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location']  # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:
        return [None, None]
cityHall_sj = get_coordinates(google_api_key, address) #cityHall coordinates
# should return coordinates of city hall [37.3380937, -121.8853892]
print(cityHall_sj)
These candidate areas are located within approximately 4 miles of City Hall, giving a good spread across the city of San Jose. Having found the coordinates of City Hall, we now have an anchor point for our lattice. The points of this lattice, representing our candidate neighbourhoods, will each be 600 meters apart. Note that this means each "neighbourhood" includes everything within a 300-meter radius of its center.
Since we need to calculate distances from City Hall as part of our analysis, it is necessary to shift from spherical coordinates (latitude/longitude) to 2D Cartesian coordinates. This will allow us to easily measure and tabulate all distances in meters. Subsequently, we can convert back from Cartesian to spherical coordinates for our Folium map. We will use the pyproj and shapely libraries to write our helper functions for coordinate transformation.
import shapely.geometry
import pyproj
import math
# define utilities for coordinate transformations and finding Cartesian distance
def s2c(lon, lat):  # spherical to Cartesian transform (lonlat_to_xy)
    proj_sphr = pyproj.Proj(proj='latlong', datum='WGS84')  # geographic (spherical) projection
    proj_cart = pyproj.Proj(proj='utm', zone=10, datum='WGS84')  # UTM zone 10 covers San Jose
    cart = pyproj.transform(proj_sphr, proj_cart, lon, lat)
    return cart[0], cart[1]

def c2s(x, y):  # Cartesian to spherical transform (xy_to_lonlat)
    proj_sphr = pyproj.Proj(proj='latlong', datum='WGS84')
    proj_cart = pyproj.Proj(proj='utm', zone=10, datum='WGS84')
    sphr = pyproj.transform(proj_cart, proj_sphr, x, y)
    return sphr[0], sphr[1]

def calc_xy_distance(x1, y1, x2, y2):  # Euclidean distance in meters between two points
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
# If the functions work as expected, the round-trip transformation will
# return lon, lat equal to cityHall_sj[1], cityHall_sj[0]
print('Coordinate transformation check')
print('-------------------------------')
print('San Jose City Hall longitude={}, latitude={}'.format(cityHall_sj[1], cityHall_sj[0]))
x, y = s2c(cityHall_sj[1],cityHall_sj[0]) #test function on city hall coordinates
print('San Jose City Hall UTM X={}, Y={}'.format(x, y))
lon, lat = c2s(x, y) #convert back to spherical coordinates
print('San Jose City Hall longitude={}, latitude={}'.format(lon, lat))
#success!!!...?
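As an independent sanity check on the planar distances, we can compare against the great-circle (haversine) distance, which needs nothing beyond the standard library. This is a sketch: the second coordinate below is an arbitrary test point roughly 6 km north of City Hall, not a location from our dataset.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    R = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2)**2
    return 2 * R * math.asin(math.sqrt(a))

# City Hall vs. a point roughly 6 km due north of it
d = haversine_m(37.3380937, -121.8853892, 37.3920, -121.8853892)
print(round(d))  # ~6000 m
```

Over distances of a few kilometres, the UTM planar distance and the haversine distance should agree to within a fraction of a percent, which is more than adequate for ranking candidate neighbourhoods.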
Wunderbar! The coordinate transformation proceeded without a hitch. Now we can construct a hexagonal lattice. To do so, we offset every other row and adjust the vertical row spacing so that each vertex is equidistant from all its neighbours.
cityHall_sj_x, cityHall_sj_y = s2c(cityHall_sj[1], cityHall_sj[0]) # Cartesian
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = cityHall_sj_x - 6000
x_step = 600
y_min = cityHall_sj_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k
latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i % 2 == 0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(cityHall_sj_x, cityHall_sj_y, x, y)
        if distance_from_center <= 6001:
            lon, lat = c2s(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
print(len(latitudes), 'candidate neighbourhood centers generated')
Folium will help us visualize what we have so far: the City Hall coordinates and the neighbourhood centroids.
import folium
map_sj = folium.Map(location=cityHall_sj, zoom_start=13)
folium.Marker(cityHall_sj, popup='Downtown SJ').add_to(map_sj)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=250, color='blue', fill=False).add_to(map_sj)
map_sj
Each of these neighbourhoods is equidistant from its nearest neighbours, and all lie within a 6 km (approximately 3.75 mile) radius of City Hall.
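The equidistance claim can be checked with a quick standalone calculation (nothing beyond the standard library is assumed): with a 600 m horizontal step, a half-step offset on alternate rows, and a vertical step of √3/2 × 600 m, every point sits 600 m from each of its six nearest neighbours.

```python
import math

spacing = 600                          # horizontal distance between points in a row
k = math.sqrt(3) / 2                   # vertical offset factor for a hexagonal grid
p = (0.0, 0.0)                         # an arbitrary lattice point
same_row = (spacing, 0.0)              # its neighbour in the same row
next_row = (spacing / 2, k * spacing)  # its nearest neighbour in the offset row above

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

print(round(dist(p, same_row), 6))  # 600.0
print(round(dist(p, next_row), 6))  # 600.0 -- offset rows keep all six neighbours equidistant
```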
Having generated coordinate pairs for the vertices of our hexagonal lattice, we proceed to use the Google Maps Geocoding API to get an address for each vertex.
# function to find approximate address of a neighbourhood given latitude and longitude
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON results =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:
        return None
#test reverse geocoding on city hall coordinates
addr = get_address(google_api_key,cityHall_sj[0],cityHall_sj[1])
print('Reverse geocoding test')
print('______________________')
print('Address of [{},{}] is: {}'.format(cityHall_sj[0], cityHall_sj[1], addr))
# use get_address() to find addresses for our neighbourhoods
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', USA', '')  # country name is unnecessary
    addresses.append(address)
    print(' .', end='')
print(' done.')
addresses[85:100]
Alright, these look like plausible San Jose addresses. We can move on and organize the data in a Pandas dataframe.
import pandas as pd
#locations dataframe
df_loc = pd.DataFrame({'Address': addresses,
                       'Latitude': latitudes,
                       'Longitude': longitudes,
                       'X': xs,
                       'Y': ys,
                       'Distance': distances_from_center})  # distance from City Hall
df_loc.head(10)
df_loc.to_pickle('./indian_locations.pkl')
Having extracted and tabulated the addresses of our neighbourhoods near City Hall, we can use the Foursquare API to learn about the venues within each neighbourhood's boundaries.
We will extract the category IDs corresponding to Indian restaurants from the Foursquare website. Note that the broad underlying category is "Food", which has its own code. Finally, we can scan venue names for the word "restaurant". Combined with the category codes for Indian restaurants, this lets us accurately count both the total number of restaurants and the number of Indian restaurants in each neighbourhood, which will enable us to assess the prospective value of each location.
# re-import libraries in case the notebook kernel was restarted
import pandas as pd
import pickle
import requests
from bs4 import BeautifulSoup
import re
#try to load from local system if we tried this before
df_loc = pd.read_pickle("./indian_locations.pkl")
df_loc.head()
# the columns have been extracted as lists for later use
addresses = list(df_loc['Address'])
latitudes = list(df_loc['Latitude'])
longitudes = list(df_loc['Longitude'])
xs = list(df_loc['X'])
ys = list(df_loc['Y'])
distances_from_center = list(df_loc['Distance'])
Great! Having restored our data from the pickle files, we can hide the Foursquare credentials in a dedicated cell below. Then we can pull and arrange the category IDs and define helper functions to manipulate the addresses and access the API.
# Foursquare client credentials
# We can use BeautifulSoup to automagically extract IDs, avoiding tedious copypasta
URL = 'https://developer.foursquare.com/docs/build-with-foursquare/categories/'
content = requests.get(URL)
soup = BeautifulSoup(content.text, 'html.parser')
Foursquare's organized developer site has been transformed into an even more organized soup object. We can query this object according to the appropriate tags to easily find the part of the source containing the restaurant names and IDs.
row = soup.find_all('li')
split_index = []
txt = [row[i].get_text() for i in range(317,344)]  # list items covering Indian cuisine
for p in range(len(txt)):  # nested loops to find where venue name ends and id begins
    for q in range(len(txt[p])):
        if txt[p][q].isnumeric():
            split_index.append(q)
            break
print(split_index)
txtVenue = [e[0][:e[1]] for e in zip(txt, split_index)]
print(txtVenue)
idVenue = [e[0][e[1]:e[1]+24] for e in zip(txt, split_index)]
print(idVenue)
# These IDs have been pulled from the Foursquare site <https://developer.foursquare.com>
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues
indian_restaurant_categories = idVenue #pulled from Foursquare developer page
def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'place', 'chaat', 'tandoori']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if (specific_filter is not None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', USA', '')
    address = address.replace(', US', '')
    return address
def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except Exception:
        venues = []
    return venues
# function to extract proximal venues given neighbourhood coordinates as input
# also create an associative array of all found restaurants and all found Indian restaurants
def get_restaurants(lats, lons):
    restaurants = {}
    indian_restaurants = {}
    location_restaurants = []
    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Use radius=325 to make sure we have complete coverage and don't miss any restaurant
        # (the dictionaries remove duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret, radius=325, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_indian = is_restaurant(venue_categories, specific_filter=indian_restaurant_categories)
            if is_res:
                x, y = s2c(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_indian, x, y)
                if venue_distance <= 300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_indian:
                    indian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, indian_restaurants, location_restaurants
# Try to load from the local file system in case we did this before
# note: the radius for Indian cuisine was changed to 325 m (not 350 m)
restaurants = {}
indian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_325.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('indian_restaurants_325.pkl', 'rb') as f:
        indian_restaurants = pickle.load(f)
    with open('location_restaurants_325.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:
    pass

# If the load failed, use the Foursquare API to get the data
if not loaded:
    restaurants, indian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    # Persist this in the local file system
    with open('restaurants_325.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('indian_restaurants_325.pkl', 'wb') as f:
        pickle.dump(indian_restaurants, f)
    with open('location_restaurants_325.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
import numpy as np
print('Total number of restaurants:', len(restaurants))
print('Total number of Indian restaurants:', len(indian_restaurants))
print('Percentage of Indian restaurants: {:.2f}%'.format(len(indian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))
print('List of Indian restaurants')
print('---------------------------')
for r in list(indian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(indian_restaurants))
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))
map_sj = folium.Map(location=cityHall_sj, zoom_start=13)
folium.Marker(cityHall_sj, popup='City Hall').add_to(map_sj)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_indian = res[6]
    color = 'red' if is_indian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_sj)
map_sj
This looks good! We swept up all restaurants in the vicinity of City Hall, and we have a good idea of where the Indian restaurants are. We also know which areas have high and low concentrations of restaurants in general, and of Indian restaurants in particular.
Now that we have collected the data, we can use it for precise analysis to identify promising locations for new Indian restaurants.
Here, we will focus on finding areas of San Jose that have low density of restaurants, particularly Indian restaurants. We will limit our analysis to 3.75 miles around City Hall.
First, we collected the required data: the location and category of every restaurant within roughly 4 miles of downtown San Jose. We also used the relevant Foursquare category codes to identify which of them are Indian restaurants.
Now we are ready to create visualisations of restaurant density. Using Folium and other tools, we will take advantage of the visual appeal of heatmaps and the intuitive power and simplicity of k-means clustering to identify ideal neighbourhoods that would serve as a good starting point for a potential entrepreneur/restaurateur looking to get into the business of serving Indian cuisine.
We will make sure that the areas we are looking at have no more than two restaurants within a radius of 250 meters and zero Indian restaurants within a radius of 800 meters.
It is time to perform some exploratory data analysis and derive additional information to expand our venue location dataframe. We will begin by counting the number of restaurants in every area we are evaluating.
location_restaurants_count = [len(res) for res in location_restaurants]
df_loc['Restaurants in area'] = location_restaurants_count
print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())
df_loc.head(10)
Alright, we can now add a column indicating the distance to the nearest Indian restaurant from each neighbourhood centroid. Note that we choose the absolute nearest one when there are multiple restaurants.
distances_to_indian_restaurant = []
for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in indian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d < min_distance:
            min_distance = d
    distances_to_indian_restaurant.append(min_distance)
df_loc['Distance to Indian restaurant'] = distances_to_indian_restaurant
df_loc.head()
print('Average distance to closest Indian restaurant from each area center:', df_loc['Distance to Indian restaurant'].mean())
Note that 1500 meters is slightly under a mile (1609 meters). A broad sweep like this is reasonable, since Foursquare is selective about what it classifies as an "Indian" restaurant.
We can now create a heatmap, overlay the borders of San Jose neighbourhoods on the map, and draw circles indicating the distance from City Hall.
sj_district_url = 'https://opendata.arcgis.com/datasets/001373893c8347d4b36cf15a6103f78c_120.geojson'
sj_district = requests.get(sj_district_url).json()
def district_style(feature):
    return {'color': 'blue', 'fill': False}
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]
indian_latlons = [[res[2], res[3]] for res in indian_restaurants.values()]
from folium import plugins
from folium.plugins import HeatMap
map_sj = folium.Map(location=cityHall_sj, zoom_start=13)
folium.TileLayer('cartodbdark_matter').add_to(map_sj) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_sj)
folium.Marker(cityHall_sj).add_to(map_sj)
folium.Circle(cityHall_sj, radius=1000, fill=False, color='white').add_to(map_sj)
folium.Circle(cityHall_sj, radius=2000, fill=False, color='white').add_to(map_sj)
folium.Circle(cityHall_sj, radius=3000, fill=False, color='white').add_to(map_sj)
folium.GeoJson(sj_district, style_function=district_style, name='geojson').add_to(map_sj)
map_sj
Preliminary observation of our heatmap shows that there are pockets of low density to the north and south-east of City Hall. We can create another heatmap showing Indian restaurants only.
map_sj = folium.Map(location = cityHall_sj, zoom_start=13)
folium.TileLayer('cartodbdark_matter').add_to(map_sj)
HeatMap(indian_latlons).add_to(map_sj)
folium.Marker(cityHall_sj).add_to(map_sj)
folium.Circle(cityHall_sj, radius=1000, fill=False, color='white').add_to(map_sj)
folium.Circle(cityHall_sj, radius=2000, fill=False, color='white').add_to(map_sj)
folium.Circle(cityHall_sj, radius=3000, fill=False, color='white').add_to(map_sj)
folium.GeoJson(sj_district, style_function=district_style, name='geojson').add_to(map_sj)
map_sj
This map is much more sparse due to the lower prevalence of Indian restaurants in San Jose. However, we see that South San Jose and East San Jose have pockets of low Indian restaurant density.
Towards the south-west we have Little Italy, and towards the south-east we have Little Saigon. We surmise that although there are opportunities to open here, the innate character of these neighbourhoods would not be compatible with an Indian restaurant. On the other hand, towards the south, the neighbourhoods of Washington and Spartan Keyes are ideal location candidates.
Preliminary analysis of these neighbourhoods shows a large Chicano demographic. Washington has prominence as a historic district, and Spartan Keyes is a notable neighbourhood of artists, art studios, and galleries. Furthermore, it is home to the south campus of San Jose State University and is itself a historic neighbourhood.
"Many former warehouses and factories have been converted into art studios and galleries."
Popular with tourists, artists, and students, these neighbourhoods justify further analysis. We can define a new, narrower region of interest here and find the parts of these areas with few restaurants.
roi_x_min = cityHall_sj_x - 4000
roi_y_max = cityHall_sj_y + 4000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = c2s(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]
map_sj = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_sj)
folium.Marker(cityHall_sj).add_to(map_sj)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_sj)
folium.GeoJson(sj_district, style_function=district_style, name='geojson').add_to(map_sj)
map_sj
Alright, this cross-section covers the pockets within Washington / Spartan Keyes that are close to City Hall. We can create a new, denser grid of location candidates restricted to this region.
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k
roi_y_min = roi_center_y - 2500
roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i % 2 == 0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if d <= 2501:
            lon, lat = c2s(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)
print(len(roi_latitudes), 'candidate neighborhood centers generated.')
Now we can evaluate the candidate areas by counting the restaurants nearby and measuring the distance to the nearest Indian restaurant.
def count_restaurants_nearby(x, y, restaurants, radius=250):
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d <= radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d <= d_min:
            d_min = d
    return d_min
roi_restaurant_counts = []
roi_indian_distances = []
print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, indian_restaurants)
    roi_indian_distances.append(distance)
print('done.')
# Put this into a dataframe
df_roi_locations = pd.DataFrame({'Latitude': roi_latitudes,
                                 'Longitude': roi_longitudes,
                                 'X': roi_xs,
                                 'Y': roi_ys,
                                 'Restaurants nearby': roi_restaurant_counts,
                                 'Distance to Indian restaurant': roi_indian_distances})
df_roi_locations.head(10)
The dataframe looks good. We can now filter the locations: we want to identify areas with no more than two restaurants within 250m and no Indian restaurants within 800m.
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())
good_ind_distance = np.array(df_roi_locations['Distance to Indian restaurant']>=800)
print('Locations with no Indian restaurants within 800m:', good_ind_distance.sum())
good_locations = np.logical_and(good_res_count, good_ind_distance)
print('Locations with both conditions met:', good_locations.sum())
df_good_locations = df_roi_locations[good_locations]
Time to plot this on a heatmap.
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values
good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]
map_sj = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_sj)
HeatMap(restaurant_latlons).add_to(map_sj)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_sj)
folium.Marker(cityHall_sj).add_to(map_sj)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sj)
folium.GeoJson(sj_district, style_function=district_style, name='geojson').add_to(map_sj)
map_sj
Good. We have identified the areas in Washington-Guadalupe and Spartan Keyes that are suitable for development: each has no more than two restaurants nearby and no Indian restaurants within 800m. Based on nearby competition, any of these locations is promising.
We can indicate these locations on a heatmap:
map_sj = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_sj)
folium.Marker(cityHall_sj).add_to(map_sj)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sj)
folium.GeoJson(sj_district, style_function=district_style, name='geojson').add_to(map_sj)
map_sj
We now have a clear indication of the zones with few restaurants nearby and no Indian restaurants at all. We can now use k-means clustering to find the relevant centroids and then compute addresses for the final result of our analysis.
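For intuition before running scikit-learn: k-means (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points. A toy standard-library sketch (illustration only; the helper name and data are hypothetical):

```python
import math
import random

def kmeans_2d(points, k, iters=50, seed=0):
    """Toy Lloyd's algorithm for 2D points (use sklearn's KMeans in practice)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize centroids from the data
    for _ in range(iters):
        # assignment step: group each point with its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [(sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
                     if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return sorted(centroids)

# two well-separated blobs -> centroids settle near each blob's mean
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_2d(pts, 2))
```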
from sklearn.cluster import KMeans
number_of_clusters = 10
good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)
cluster_centers = [c2s(cc[0], cc[1]) for cc in kmeans.cluster_centers_]
map_sj = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_sj)
HeatMap(restaurant_latlons).add_to(map_sj)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_sj)
folium.Marker(cityHall_sj).add_to(map_sj)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_sj)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sj)
folium.GeoJson(sj_district, style_function=district_style, name='geojson').add_to(map_sj)
map_sj
The clusters cover almost all of the candidate area reasonably well and do a decent job of indicating the zones with the most valid areas. The centroids are well placed in the appropriate zones.
We can find the addresses of these cluster centers and treat them as good approximate starting points; from each, interested parties can search the surrounding neighbourhood for the best specific location.
We can also view these zones without a heatmap, using shading to indicate the covered area.
map_sj = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(cityHall_sj).add_to(map_sj)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_sj)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sj)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_sj)
folium.GeoJson(sj_district, style_function=district_style, name='geojson').add_to(map_sj)
map_sj
Now, we can reverse geocode these areas to get addresses which are suitable for development.
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon).replace(', USA', '')
    candidate_area_addresses.append(addr)
    x, y = s2c(lon, lat)
    d = calc_xy_distance(x, y, cityHall_sj_x, cityHall_sj_y)
    print('{}{} => {:.1f}km from City Hall'.format(addr, ' '*(50-len(addr)), d/1000))
Our analysis is done. We generated 10 addresses marking zones that have few restaurants and no Indian restaurants, all close to City Hall. These addresses should be considered a starting point for further research and serve only as rough approximations of promising zones. The locations are interesting due to the presence of tourists and a large artist/student community, while remaining close to downtown San Jose.
map_sj = folium.Map(location=roi_center, zoom_start=14)
folium.Circle(cityHall_sj, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_sj)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_sj)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_sj)
map_sj
The analysis shows that despite the large number of restaurants in San Jose, there are pockets of low density fairly close to City Hall. Most Indian restaurants lie to the north, hence we focused most of our attention on the neighbourhood of Spartan Keyes, to the south of City Hall. With its community of artists, students, and tourists, along with recent conversions of buildings from commercial to residential use, it is an ideal neighbourhood in which to start an Indian restaurant. Another interesting neighbourhood was Washington-Guadalupe; however, we concentrated most of our attention on Spartan Keyes.
Subsequently, we created a closely spaced grid of location candidates and filtered for those with few restaurants nearby and no Indian restaurants nearby. Then, after clustering these zones, we used reverse geocoding to find approximate addresses as starting points for more detailed local analysis based on other factors.
This results in 10 zones containing the locations with the greatest potential for a new restaurant, based on the parameters we filtered for. Naturally, these are not all optimal locations; there could easily be other reasons to rule them out. Furthermore, locations outside this area could also be excellent candidates (close to the highway but far from City Hall, for example). The techniques used in this project serve only to illustrate one possible way of identifying desirable locations for a new venue.
The purpose of this project was to identify areas of San Jose close to City Hall with a low number of restaurants (particularly Indian restaurants) in order to aid entrepreneurs, investors, speculators, and restaurateurs in narrowing down the search for an optimal location for a new Indian restaurant. By calculating the restaurant density distribution from Foursquare data, we identified general neighbourhoods that justify further analysis (Washington-Guadalupe and Spartan Keyes), then generated an extensive collection of locations satisfying basic requirements regarding nearby venues. Clustering those locations identified the zones of greatest interest (containing the most potential locations), and the addresses of the cluster centroids were reverse geocoded to serve as starting points for final exploration by interested parties.
Final decisions on the optimal location will be made by stakeholders based on the specific characteristics of the neighbourhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to public transportation, for example), noise levels, prices, and the social dynamics of each neighbourhood.